1. Introduction.


The literature on sufficiency is extensive, and it is not the aim of
the present paper to give a complete survey of it. We shall discuss
the relationship between a number of notions introduced by various authors
with different problems in mind, all of them being of the same nature
as sufficiency. Some of these notions were defined in terms of subfields
of abstract probability spaces, but we shall restate all definitions in
terms of statistics and discrete probabilities, as our interest is directed
more towards structural properties than technical ones.

2.  Sufficiency,  adequacy  and  summarizing  statistics. 

In  the  present  section  we  shall  investigate  three  different  properties 
of  statistics  with  the  same  basic  idea,  namely  that  they  express  the 
intuitive  statement  that  a  statistic  contains  all  "relevant"  information. 

First, the classical notion of a sufficient statistic as introduced
by Fisher (1920). We shall define it in the following way:

Let $X$ be a random variable on a discrete, at most denumerable space
$E$ and $t$ a mapping from $E$ into another discrete space $F$. Let $\mathcal{P}$
be a family of probabilities on $E$ and let $Y = t(X)$.

Definition 2.1. $t$ is said to be sufficient for $\mathcal{P}$ if there is a fixed
non-negative real function $\varphi$ on $E \times F$ so that for all $P \in \mathcal{P}$ and all
$x \in E$:

$$P\{Y = y\} > 0 \;\Rightarrow\; P\{X = x \mid Y = y\} = \varphi(x, y).$$

A slightly stronger notion was introduced by Freedman (1962) with
the purely probabilistic motivation of generalizing de Finetti's theorem
for exchangeable 0-1 random variables. The notion is, however, closely
related to sufficiency, as we shall soon see. Again let $X$ be a random
variable on a discrete space $E$ and $t$ a mapping from $E$ to a discrete
space $F$.

Definition 2.2. A probability measure $P$ on $E$ is said to be summarized
by $t$ if for all $x, x' \in E$

$$t(x) = t(x') \;\Rightarrow\; P\{X = x\} = P\{X = x'\}.$$

In contrast to Definition 2.1, we are not dealing with a family of
probabilities but only with one probability measure. To be able to see
the relation between a sufficient statistic and a summarizing statistic
we have to define a summarizing statistic for a family of probabilities
$\mathcal{P}$. In the previous notation we define

Definition 2.3. $t$ is said to summarize $\mathcal{P}$ if all $P \in \mathcal{P}$ are summarized
by $t$.

Remark: Note that the term "summarizing" is essentially related to
discrete random variables, as opposed to other concepts dealt with in the
present paper.

This  is  stronger  than  sufficiency: 

Proposition 2.1. If $\mathcal{P}$ is a family of probability measures on $E$ and
$t: E \to F$ summarizes $\mathcal{P}$, then $t$ is sufficient for $\mathcal{P}$.

Proof. We shall just specify the function $\varphi$ in Definition 2.1.
Define $\varphi$ as

$$\varphi(x, y) = \begin{cases} 0 & \text{if } t(x) \neq y, \\ \dfrac{1}{N(y)} & \text{for } t(x) = y, \end{cases} \tag{1}$$

where $N(y)$ is the total number of $x$'s so that $t(x) = y$.
If $P\{Y = y\} > 0$, we have

$$P\{X = x \mid Y = y\} = \frac{P\{X = x \wedge Y = y\}}{P\{Y = y\}} = \chi_{t^{-1}(y)}(x) \, \frac{P\{X = x\}}{\sum_{z \in t^{-1}(y)} P\{X = z\}}, \tag{2}$$

where $\chi_A$ is the indicator function of the set $A$:

$$\chi_A(x) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{if } x \notin A. \end{cases}$$

If $t(x) = y$, we have

$$P\{X = z\} = P\{X = x\} \quad \text{for all } z \in t^{-1}(y), \tag{3}$$

since $t$ was a summarizing statistic. Hence

$$P\{Y = y\} = \sum_{z \in t^{-1}(y)} P\{X = z\} = P\{X = x\} \cdot N(y). \tag{4}$$

So

$$P\{Y = y\} > 0 \;\Rightarrow\; N(y) < \infty \tag{5}$$

and

$$P\{X = x \mid Y = y\} = \chi_{t^{-1}(y)}(x) \cdot \frac{1}{N(y)}, \tag{6}$$

which was to be proved.

So the notion of a summarizing statistic is stronger than that of
a sufficient statistic in the sense that not only is the conditional
distribution of $X$ given $t(X)$ supposed to be known, but this
distribution is supposed to have the specific form (6), i.e. uniform
on the set $t^{-1}(t(x))$.
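
The relation between the two notions can be checked by brute force on a small finite example. The following Python sketch is only an illustration: the Bernoulli family, the grid of parameter values and all function names are assumptions made for the example and do not come from the text. It tests Definitions 2.1 and 2.2 directly and looks at the conditional distribution on one set $t^{-1}(y)$.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def conditional_given_statistic(P, t):
    """Return {(x, t(x)): P{X = x | Y = t(x)}} wherever P{Y = t(x)} > 0."""
    p_y = defaultdict(Fraction)
    for x, p in P.items():
        p_y[t(x)] += p
    return {(x, t(x)): p / p_y[t(x)] for x, p in P.items() if p_y[t(x)] > 0}


def is_summarized(P, t):
    """Definition 2.2: points with the same value of t have the same probability."""
    probabilities = defaultdict(set)
    for x, p in P.items():
        probabilities[t(x)].add(p)
    return all(len(values) == 1 for values in probabilities.values())


def is_sufficient(family, t):
    """Definition 2.1: a single function phi works for every P in the family."""
    phi = {}
    for P in family:
        for key, value in conditional_given_statistic(P, t).items():
            if phi.setdefault(key, value) != value:
                return False
    return True


# Assumed example: three independent Bernoulli(theta) trials, theta = 1/10, ..., 9/10.
E = list(product((0, 1), repeat=3))


def bernoulli_family_member(theta):
    return {x: theta ** sum(x) * (1 - theta) ** (3 - sum(x)) for x in E}


family = [bernoulli_family_member(Fraction(k, 10)) for k in range(1, 10)]
number_of_successes = sum           # t(x) = x_1 + x_2 + x_3
first_trial = lambda x: x[0]        # a statistic that is not sufficient here

print(all(is_summarized(P, number_of_successes) for P in family))
print(is_sufficient(family, number_of_successes))
print(is_sufficient(family, first_trial))
print(conditional_given_statistic(family[0], number_of_successes)[((1, 0, 0), 1)])
```

In this assumed example the number of successes summarizes every member of the family, and the conditional probability of a point with one success is 1/3, the reciprocal of the number of such points. Exact rational arithmetic is used so that the equality tests are not affected by rounding.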

Barndorff-Nielsen and Skibinsky (1963) considered the problem of
how much one could reduce a data set and still have all relevant
information for the prediction of an unobserved random variable when the joint
distribution of the data and the unobserved random variable was completely
known, and defined the notion of adequacy. This definition was later extended
by Skibinsky (1967) to the case where this joint distribution was only known
to be a member of a specified family of distributions.

Let $X$ and $Z$ be random variables on discrete, at most denumerable
spaces $E$ and $G$. Let $\mathcal{P}$ be a family of distributions on $E \times G$ and
let $\mathcal{P}_E$ denote the induced family of marginal distributions of $X$.

Let $t$ be a mapping from $E$ to an at most denumerable space $F$.

Definition 2.4. $t$ is said to be adequate for $Z$ if

i) $t$ is sufficient for $\mathcal{P}_E$

ii) for all $P \in \mathcal{P}$: $P\{X = x\} > 0 \;\Rightarrow\; P\{Z = z \mid X = x\} = P\{Z = z \mid t(X) = t(x)\}$.


This definition suggests that the classical notion of sufficiency
is not satisfactory for the theory of statistical inference in stochastic
processes, as the prediction of unobserved random variables (the future
of the process observed) will in most cases be relevant. In the next
section we shall consider some extra conditions that have been
imposed on a sequence of statistics by various authors.
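
Condition ii) of the definition of adequacy can be tested mechanically for a single joint distribution on a finite space. The sketch below is an illustration under assumptions: the two-state Markov chain, its transition probabilities and the function names are hypothetical. It checks only condition ii); sufficiency for the marginal family (condition i) would have to be verified separately.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def satisfies_condition_ii(joint, t):
    """Check P{Z = z | X = x} = P{Z = z | t(X) = t(x)} whenever P{X = x} > 0.

    `joint` maps (x, z) to P{X = x, Z = z} and should list every pair (x, z),
    including those of probability zero.
    """
    p_x = defaultdict(Fraction)     # P{X = x}
    p_y = defaultdict(Fraction)     # P{t(X) = y}
    p_yz = defaultdict(Fraction)    # P{t(X) = y, Z = z}
    for (x, z), p in joint.items():
        p_x[x] += p
        p_y[t(x)] += p
        p_yz[t(x), z] += p
    return all(
        p / p_x[x] == p_yz[t(x), z] / p_y[t(x)]
        for (x, z), p in joint.items()
        if p_x[x] > 0
    )


# Assumed example: a two-state Markov chain X_1, X_2, X_3 with uniform start,
# observed data X = (X_1, X_2) and unobserved variable Z = X_3.
T = {0: {0: Fraction(3, 4), 1: Fraction(1, 4)},
     1: {0: Fraction(1, 3), 1: Fraction(2, 3)}}
joint = {((a, b), c): Fraction(1, 2) * T[a][b] * T[b][c]
         for a, b, c in product((0, 1), repeat=3)}

last_state = lambda x: x[1]
first_state = lambda x: x[0]

print(satisfies_condition_ii(joint, last_state))
print(satisfies_condition_ii(joint, first_state))
```

For this chain the last observed state carries everything needed to predict $X_3$, whereas the first state does not, which is the distinction condition ii) is meant to capture.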


3. Sequences of statistics.


In the present section we shall let $X_1, X_2, \ldots$ be a sequence of
random variables on discrete, at most denumerable spaces $E_1, E_2, \ldots$ and
let $\mathcal{P}$ be a family of probability measures on $E_1 \times E_2 \times \cdots$. Let
$\mathcal{P}^{(n)}$ denote the family of marginal distributions of $X_1, \ldots, X_n$
induced by $\mathcal{P}$. We shall consider a sequence $t_1, t_2, \ldots$ of
mappings

$$t_n : E_1 \times \cdots \times E_n \to F_n,$$

where $F_n$ are discrete, at most denumerable, and let $Y_n = t_n(X_1, \ldots, X_n)$.

Bahadur (1954) introduced the notion of a sufficient and transitive
sequence of statistics in connection with sequential decision theory,
which can be stated as follows.

Definition 3.1. The sequence $t_1, t_2, \ldots$ is said to be sufficient and
transitive if for all $n$, $t_n$ is adequate for $Y_{n+1}$.

In other words, $t_1, t_2, \ldots$ is sufficient and transitive iff at
each step $n$ it contains all information relevant for the prediction of
the value of the next statistic. This is related to but different from
the notion of a totally sufficient statistic, introduced by Lauritzen
(1972) in terms of abstract measure spaces and restated in terms of
discrete probability spaces in Lauritzen (1974).

Definition 3.2. $t_n$ is said to be totally sufficient if it is adequate
for $(X_{n+1}, \ldots, X_{n+k})$ for all $k = 1, 2, \ldots$.

That the two notions are different can be seen by the following
example:

Example 1. Let $X_1, X_2$ be independent Poisson distributed with mean
$\lambda > 0$, and let $X_n = X_2 + Z_1 + \cdots + Z_{n-2}$ for $n \geq 3$, where $Z_1, Z_2, \ldots$
are independent of $X_1, X_2$ and independent identically Poisson distributed
with mean 1. The sequence $t_1, t_2, \ldots$ of mappings defined by
$t_1(x) = x$ and

$$t_n(x_1, \ldots, x_n) = x_1 + x_2 \quad \text{for } n \geq 2 \tag{7}$$

is sufficient and transitive, whereas e.g. $t_2$ is not totally sufficient,
as $(X_1, X_2)$ and $X_3 = X_2 + Z_1$ are not conditionally independent given $Y_2$.

On the other hand, the sequence $s_1, s_2, \ldots$ defined as

$$\begin{aligned}
s_1(x) &= x, \\
s_2(x_1, x_2) &= (x_1, x_2), \\
s_3(x_1, x_2, x_3) &= (x_1, x_2, x_3), \\
s_4(x_1, x_2, x_3, x_4) &= (x_1, x_2, x_4), \\
s_5(x_1, x_2, x_3, x_4, x_5) &= (x_1, x_2, x_3, x_5) \quad \text{and} \\
s_n(x_1, \ldots, x_n) &= (x_1, x_2, x_n) \quad \text{for } n \geq 6
\end{aligned} \tag{8}$$

is totally sufficient but not sufficient and transitive, because
$(X_1, X_2, X_3, X_4)$ and $s_5(X_1, \ldots, X_5) = (X_1, X_2, X_3, X_5)$ are not conditionally
independent given $s_4(X_1, \ldots, X_4) = (X_1, X_2, X_4)$.
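
The first half of Example 1 can be made concrete numerically. The sketch below is an illustration only, with hypothetical function names; it uses the standard fact that, for independent Poisson variables with a common mean, $X_2$ given $X_1 + X_2 = s$ is binomial with parameters $s$ and $1/2$, and the relation $X_3 = X_2 + Z_1$ with $Z_1$ Poisson of mean 1. It compares the conditional distribution of $X_3$ given $(X_1, X_2)$ with the one given only $Y_2 = X_1 + X_2$.

```python
from math import comb, exp, factorial


def poisson_pmf(k, mean):
    """Poisson point probability, zero for negative arguments."""
    return exp(-mean) * mean ** k / factorial(k) if k >= 0 else 0.0


def p_x3_given_pair(x3, x1, x2):
    """P{X_3 = x3 | X_1 = x1, X_2 = x2}; it does not depend on x1."""
    return poisson_pmf(x3 - x2, 1.0)


def p_x3_given_sum(x3, s):
    """P{X_3 = x3 | X_1 + X_2 = s}, averaging over X_2 ~ Binomial(s, 1/2)."""
    return sum(
        comb(s, x2) * 0.5 ** s * poisson_pmf(x3 - x2, 1.0)
        for x2 in range(s + 1)
    )


# Both conditioning events below have X_1 + X_2 = 2, yet the first also fixes X_2 = 0.
print(p_x3_given_pair(0, 2, 0))   # equals exp(-1)
print(p_x3_given_sum(0, 2))       # equals exp(-1) / 4
```

The two conditional probabilities differ, so knowing $(X_1, X_2)$ improves the prediction of $X_3$ beyond what $X_1 + X_2$ provides, which is why $t_2$ fails to be totally sufficient. Neither quantity involves $\lambda$.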

If one wants to ensure that a sequence of totally sufficient statistics
is sufficient and transitive, an extra condition has to be imposed.
The following algebraic property of a sequence of statistics is a
slight weakening of "S-structure" as defined by Freedman (1962).

Definition 3.3. $t_1, t_2, \ldots$ is said to have $\Sigma$-structure if for all
$m, n$

$$t_n(x_1, \ldots, x_n) = t_n(y_1, \ldots, y_n) \;\Rightarrow\; t_{n+m}(x_1, \ldots, x_n, x_{n+1}, \ldots, x_{n+m}) = t_{n+m}(y_1, \ldots, y_n, x_{n+1}, \ldots, x_{n+m}).$$

In other words, $t_1, t_2, \ldots$ has $\Sigma$-structure if for all $m$ and $n$
there is a mapping

$$\psi_{nm} : F_n \times E_{n+1} \times \cdots \times E_{n+m} \to F_{n+m},$$

so that

$$t_{n+m}(x_1, \ldots, x_{n+m}) = \psi_{nm}(t_n(x_1, \ldots, x_n), x_{n+1}, \ldots, x_{n+m}).$$

The term $\Sigma$-structure is chosen to emphasise that $t_{n+m}$ is a "generalized
sum" of $t_n(x_1, \ldots, x_n)$ and $(x_{n+1}, \ldots, x_{n+m})$, as is e.g. the case in
classical exponential families, where we have

$$t_n(x_1, \ldots, x_n) = t(x_1) + \cdots + t(x_n)$$

for $t$ being some function from $E$ into $k$-dimensional Euclidean space.
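
On a finite space the defining implication can be tested exhaustively for fixed $n$ and $m$. The following sketch is a hedged illustration: the space $\{0, 1, 2\}$, the two statistics and the function name are assumptions chosen for the example, and a single function of tuples of any length stands in for the whole sequence $t_1, t_2, \ldots$.

```python
from itertools import product


def has_sigma_structure(t, E, n, m):
    """Brute-force test, for fixed n and m, that t_{n+m} depends on the first
    n coordinates only through t_n; all coordinates range over the finite set E."""
    image = {}   # (t_n(x_1..x_n), (x_{n+1}..x_{n+m})) -> t_{n+m}(x_1..x_{n+m})
    for head in product(E, repeat=n):
        for tail in product(E, repeat=m):
            key = (t(head), tail)
            value = t(head + tail)
            if image.setdefault(key, value) != value:
                return False   # two heads with equal t_n give different t_{n+m}
    return True


E = (0, 1, 2)                                 # assumed finite sample space
sample_range = lambda x: max(x) - min(x)      # hypothetical counterexample

print(has_sigma_structure(sum, E, 2, 1))           # sums are "generalized sums"
print(has_sigma_structure(max, E, 2, 1))           # so are maxima
print(has_sigma_structure(sample_range, E, 2, 1))  # the range is not
```

The range fails because, for instance, the samples (0, 1) and (1, 2) have the same range while appending the observation 0 gives ranges 1 and 2. When the test succeeds, the dictionary built inside the function is exactly a table of the mapping $\psi_{nm}$.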

We can now show the following result:

Proposition 3.1. If for all $n$, $t_n$ is totally sufficient and if
$t_1, t_2, \ldots$ has $\Sigma$-structure, then $t_1, t_2, \ldots$ is sufficient and transitive.

Proof. $t_n$ is clearly sufficient for $\mathcal{P}^{(n)}$ for all $n$. We have to show
that $Y_{n+1}$ and $(X_1, \ldots, X_n)$ are conditionally independent given $Y_n$. We
get

$$\begin{aligned}
P\{Y_{n+1} = y \mid X_1 = x_1, \ldots, X_n = x_n\}
&= P\{t_{n+1}(X_1, \ldots, X_n, X_{n+1}) = y \mid X_1 = x_1, \ldots, X_n = x_n\} \\
&= P\{\psi_{n,1}(t_n(X_1, \ldots, X_n), X_{n+1}) = y \mid X_1 = x_1, \ldots, X_n = x_n\},
\end{aligned} \tag{9}$$

where $\psi_{n,1}$ satisfies

$$t_{n+1}(x_1, \ldots, x_{n+1}) = \psi_{n,1}(t_n(x_1, \ldots, x_n), x_{n+1}). \tag{10}$$

As $t_n$ is totally sufficient, $X_{n+1}$ and $X_1, \ldots, X_n$ are conditionally
independent given $Y_n$, and we get from (9) that

$$\begin{aligned}
P\{Y_{n+1} = y \mid X_1 = x_1, \ldots, X_n = x_n\}
&= P\{\psi_{n,1}(t_n(x_1, \ldots, x_n), X_{n+1}) = y \mid Y_n = t_n(x_1, \ldots, x_n)\} \\
&= P\{Y_{n+1} = y \mid Y_n = t_n(x_1, \ldots, x_n)\},
\end{aligned} \tag{11}$$

which was to be proved.

Martin-Löf (1973) defined an algebraic consistency condition for a
sequence of statistics which is slightly weaker than $\Sigma$-structure. We
shall call it $\Sigma^*$-structure:

Definition 3.4. $t_1, t_2, \ldots$ is said to have $\Sigma^*$-structure if for all
$n, m$ there is a function

$$N_{n,m} : F_n \times F_{n+m} \to \{0, 1, \ldots\}$$

so that for all $(x_1, \ldots, x_n) \in E_1 \times \cdots \times E_n$

$$\#\{(x_{n+1}, \ldots, x_{n+m}) \in E_{n+1} \times \cdots \times E_{n+m} : t_{n+m}(x_1, \ldots, x_{n+m}) = y\} = N_{n,m}(t_n(x_1, \ldots, x_n), y).$$

Remark: The discreteness of the sample space is also essential here,
as the number of points corresponding to given values of the statistics
occurs in the definition in a fundamental manner.
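
The counting condition can likewise be checked by enumeration when the spaces are finite. The sketch below is illustrative and rests on assumptions: the sample spaces $\{0, 1\}$ and $\{0, 1, 2\}$, the statistics and the function name are hypothetical. For each initial segment it tabulates how many continuations lead to each value of the later statistic and verifies that the table depends on the segment only through the value of the statistic.

```python
from collections import Counter
from itertools import product


def counting_function(t, E, n, m):
    """Return the table {t_n value: Counter({y: number of continuations})}
    if the counts depend on (x_1..x_n) only through t_n, otherwise None."""
    table = {}
    for head in product(E, repeat=n):
        counts = Counter(t(head + tail) for tail in product(E, repeat=m))
        if table.setdefault(t(head), counts) != counts:
            return None   # same t_n value, different numbers of continuations
    return table


sample_range = lambda x: max(x) - min(x)   # hypothetical counterexample

N = counting_function(sum, (0, 1), 2, 2)
print(N[1][2])   # continuations (x_3, x_4) in {0,1}^2 with 1 + x_3 + x_4 = 2
print(counting_function(sample_range, (0, 1, 2), 2, 1))
```

For the sum on $\{0, 1\}$ the entry for a current value 1 and a later value 2 is 2, since exactly the continuations (0, 1) and (1, 0) qualify. The range on $\{0, 1, 2\}$ returns `None`: the segments (0, 0) and (1, 1) both have range 0 but admit different numbers of continuations with range 1.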

It is immediate that we have

Proposition 3.2. If $t_1, t_2, \ldots$ has $\Sigma$-structure, then it has $\Sigma^*$-structure.
Proof. Trivial.

$\Sigma^*$-structure becomes an important property when the conditional
distributions given the statistics are determined by the numbers

$$N_n(y) = \#\{(x_1, \ldots, x_n) \in E_1 \times \cdots \times E_n : t_n(x_1, \ldots, x_n) = y\} \tag{12}$$

and we have

Proposition 3.3. If for all $n$, $\mathcal{P}^{(n)}$ is summarized by $t_n$ and if
$t_1, t_2, \ldots$ has $\Sigma^*$-structure, then $t_1, t_2, \ldots$ is sufficient and
transitive.

Proof. We already know that $t_n$ is sufficient for $\mathcal{P}^{(n)}$ from Proposition
2.1. As in Proposition 3.1 it remains to be shown that $Y_{n+1}$ and
$X_1, \ldots, X_n$ are conditionally independent given $Y_n$. We have, for all
$P \in \mathcal{P}$,

$$\begin{aligned}
P\{X_1 = x_1, \ldots, X_n = x_n\}
&= P\{X_1 = x_1, \ldots, X_n = x_n \mid Y_n = t_n(x_1, \ldots, x_n)\} \cdot P\{Y_n = t_n(x_1, \ldots, x_n)\} \\
&= \frac{P\{Y_n = t_n(x_1, \ldots, x_n)\}}{N_n(t_n(x_1, \ldots, x_n))},
\end{aligned} \tag{13}$$

according to Proposition 2.1.

Furthermore

$$\begin{aligned}
P\{Y_{n+1} = y \wedge X_1 = x_1, \ldots, X_n = x_n\}
&= \sum_{x :\, (x_1, \ldots, x_n, x) \in t_{n+1}^{-1}(y)} \frac{P\{Y_{n+1} = y\}}{N_{n+1}(y)} \\
&= N_{n,1}(t_n(x_1, \ldots, x_n), y) \cdot \frac{P\{Y_{n+1} = y\}}{N_{n+1}(y)}.
\end{aligned} \tag{14}$$

Hence from (13) and (14) we get

$$P\{Y_{n+1} = y \mid X_1 = x_1, \ldots, X_n = x_n\} = \frac{P\{Y_{n+1} = y\} \, N_n(t_n(x_1, \ldots, x_n))}{N_{n+1}(y) \, P\{Y_n = t_n(x_1, \ldots, x_n)\}} \, N_{n,1}(t_n(x_1, \ldots, x_n), y). \tag{15}$$

Since this only depends on $x_1, \ldots, x_n$ through $t_n(x_1, \ldots, x_n)$, we must
have

$$P\{Y_{n+1} = y \mid X_1 = x_1, \ldots, X_n = x_n\} = P\{Y_{n+1} = y \mid Y_n = t_n(x_1, \ldots, x_n)\}, \tag{16}$$

which was to be proved.

If we assume $\Sigma$-structure instead of $\Sigma^*$-structure, we have the
even stronger result:

Proposition 3.4. If for all $n$, $\mathcal{P}^{(n)}$ is summarized by $t_n$ and if
$t_1, t_2, \ldots$ has $\Sigma$-structure, then $t_n$ is totally sufficient for all
$n$.

Proof. As in the previous proof, we only have to establish that
$X_1, \ldots, X_n$ and $X_{n+1}, \ldots, X_{n+k}$ are conditionally independent given $Y_n$
for all $n$ and $k$. We have

$$\begin{aligned}
P\{X_1 = x_1, \ldots, X_{n+k} = x_{n+k} \mid Y_n = y\}
&= \chi_{t_n^{-1}(y)}(x_1, \ldots, x_n) \cdot \frac{P\{Y_{n+k} = t_{n+k}(x_1, \ldots, x_{n+k})\}}{P\{Y_n = y\} \, N_{n+k}(t_{n+k}(x_1, \ldots, x_{n+k}))} \\
&= \chi_{t_n^{-1}(y)}(x_1, \ldots, x_n) \cdot \frac{P\{Y_{n+k} = \psi_{n,k}(y, x_{n+1}, \ldots, x_{n+k})\}}{P\{Y_n = y\} \, N_{n+k}(\psi_{n,k}(y, x_{n+1}, \ldots, x_{n+k}))},
\end{aligned} \tag{17}$$

where $\psi_{n,k}$ is given by

$$\psi_{n,k}(t_n(x_1, \ldots, x_n), x_{n+1}, \ldots, x_{n+k}) = t_{n+k}(x_1, \ldots, x_{n+k}). \tag{18}$$

But for fixed $y$, the expression (17) is a product in $(x_1, \ldots, x_n)$ and
$(x_{n+1}, \ldots, x_{n+k})$. Hence $X_1, \ldots, X_n$ and $X_{n+1}, \ldots, X_{n+k}$ are conditionally
independent given $Y_n = y$, which was to be proved.

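Apparently the property of being summarizing with $\Sigma$-structure
is very strong. Another way of strengthening total sufficiency is to
assume minimality. As in Lauritzen (1974) we say that $t_n$ is minimal
totally sufficient if it is a function of any other totally sufficient
statistic. We then have

Proposition 3.5. If for all $n$, $t_n$ is minimal totally sufficient then
$t_1, t_2, \ldots$ has $\Sigma$-structure.

Proof. The result follows directly from corollary 1 of Lauritzen (1974).

If we include the notion of a minimal sufficient statistic in our
considerations, we can "summarize" the results in the following diagram:

[Diagram of implications among the notions: minimal totally sufficient; totally sufficient; totally sufficient with $\Sigma$-structure; sufficient and transitive; summarizing with $\Sigma$-structure; summarizing with $\Sigma^*$-structure; minimal sufficient; sufficient; summarizing. Implications recoverable from the text:]

    minimal totally sufficient            =>  totally sufficient with $\Sigma$-structure   (Proposition 3.5)
    summarizing with $\Sigma$-structure     =>  totally sufficient with $\Sigma$-structure   (Proposition 3.4)
    summarizing with $\Sigma$-structure     =>  summarizing with $\Sigma^*$-structure        (Proposition 3.2)
    summarizing with $\Sigma^*$-structure   =>  sufficient and transitive                  (Proposition 3.3)
    totally sufficient with $\Sigma$-structure  =>  sufficient and transitive              (Proposition 3.1)
    summarizing                           =>  sufficient                                 (Proposition 2.1)
    minimal totally sufficient            =>  totally sufficient
    totally sufficient with $\Sigma$-structure  =>  totally sufficient
    totally sufficient                    =>  sufficient
    sufficient and transitive             =>  sufficient
    minimal sufficient                    =>  sufficient
    summarizing with $\Sigma^*$-structure   =>  summarizing

(The implications that are not proved above are trivial.)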

At this point the author feels uncomfortable as a statistician. Is
it really so that all these various notions are relevant? It is certainly
true that in many examples at least some of the notions coincide. So far
we have dealt with $\mathcal{P}$ being an arbitrary family of probability measures,
which in some sense is unreasonable from a statistical point of view.

In the last section we shall impose regularity conditions on $\mathcal{P}$ and see
how many of the implications in the diagram turn into equivalences.

4. Independence and universality.

Let us assume that for all $P \in \mathcal{P}$, $X_1, X_2, \ldots$ are independent
random variables. It is then immediate that total sufficiency and
sufficiency coincide, and the same is of course true for minimal total
sufficiency and minimal sufficiency. Hence it appears from the diagram
that e.g. "minimal sufficiency" implies everything but "summarizing"
and is thus a very strong property.

Barndorff-Nielsen (1973) discussed the notion of a universal family
of probability measures in connection with the notion of M-ancillarity.
Let $X$ be a random variable on a discrete, at most denumerable set $E$
and $\mathcal{P}$ a family of probability measures on $E$.

Definition 4.1. $\mathcal{P}$ is said to be universal if for all $x \in E$ there is
a $P \in \mathcal{P}$ so that

$$P\{X = x\} \geq P\{X = y\}$$

for all $y \in E$.
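
Universality says that every point of $E$ is a mode of at least one member of the family, and for finite families on finite spaces this is easy to test. The sketch below is an illustration under assumptions: the binomial families, the parameter values and the function names are hypothetical, and the inequality in the definition is read as non-strict.

```python
from fractions import Fraction
from math import comb


def is_universal(family, E):
    """Every x in E attains the maximal point probability under some P in the family."""
    return all(
        any(P[x] == max(P[y] for y in E) for P in family)
        for x in E
    )


def binomial(theta, n=3):
    """Point probabilities of a Binomial(n, theta) variable on {0, ..., n}."""
    return {x: comb(n, x) * theta ** x * (1 - theta) ** (n - x) for x in range(n + 1)}


E = range(4)
rich_family = [binomial(Fraction(k, 8)) for k in (1, 3, 5, 7)]   # modes 0, 1, 2, 3
poor_family = [binomial(Fraction(1, 8)), binomial(Fraction(7, 8))]  # modes 0 and 3 only

print(is_universal(rich_family, E))
print(is_universal(poor_family, E))
```

The first assumed family is universal because each of the four outcomes is the mode for one of the parameter values; the second is not, since the outcomes 1 and 2 are never most probable.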

The following result, given in e.g. Barndorff-Nielsen (1973), shows
a relation to the discussion in the preceding sections:

Proposition 4.1. If $\mathcal{P}$ is universal and $t$ is sufficient for $\mathcal{P}$,
then $t$ summarizes $\mathcal{P}$.

Proof. The proof is exactly as in Barndorff-Nielsen (1973), theorem 2.1.
Although $E$ is assumed to be finite in that paper, this assumption is
irrelevant for the validity of the proof.

Hence, if in the previous section $\mathcal{P}^{(n)}$ is assumed to be universal
for all $n$, "sufficient" implies "summarizing" and "totally sufficient
with $\Sigma$-structure" implies "summarizing with $\Sigma$-structure". Hence from
the diagram it appears that "minimal totally sufficient" implies
everything but "minimal sufficient". Finally, if $X_1, X_2, \ldots$ are all assumed
to be independent and at the same time $\mathcal{P}^{(n)}$ is assumed to be universal
for all $n$, "minimal sufficient" implies everything else.